Title: Reinforcement Learning via Convex Duality

Ofir Nachum

Abstract:
We review basic concepts of convex duality and summarize how this duality may be applied to a variety of reinforcement learning (RL) settings, including policy evaluation or optimization, online or offline learning, and discounted or undiscounted rewards. The derivations yield a number of intriguing results, including the ability to perform policy evaluation and on-policy policy gradient estimation with behavior-agnostic offline data, as well as methods to learn a policy via maximum-likelihood optimization. These results encompass both new algorithms and new perspectives on old ones. By providing a unified treatment and perspective on these results, we hope to enable researchers to better use and apply the tools of convex duality to make further progress in RL.